Chapter 9 Multivariate models and predictive accuracy
In this chapter we’re going to consider the geometry of models with multiple continuous predictors. We’re also going to talk about the predictive accuracy of our models, which can help us understand how well our models represent our data.
9.1 Data and research questions
We’re still working with the perceptual data we used in Chapters 7 and 8, described in detail at the beginning of those chapters. We load the data below:
# data setup
#################################################
library (plotly)
library (brms)
options (contrasts = c("contr.sum","contr.sum"))
# source data
url1 = "https://raw.githubusercontent.com/santiagobarreda"
url2 = "/stats-class/master/data/h95_experiment_data.csv"
h95 = read.csv (url(paste0 (url1, url2)))
# set up colors for plotting
devtools::source_url (paste0 (url1, "/stats-class/master/data/colors.R"))
# source functions
devtools::source_url (paste0 (url1, "/stats-class/master/data/functions.R"))

We’re going to consider two parallel models investigating the perception of speaker height and adultness from speech acoustics. The models include the following variables:
padult: a dichotomous variable, ‘a’ = adult response, ‘c’ = child response.
pheight: a continuous variable indicating perceived height in inches (or its scaled counterpart pheight_s).
pgender: a factor indicating whether subjects indicated hearing a female (pgender=f) or male (-pgender=m) speaker.
g0: log-f0 (or its scaled counterpart g0_s), a continuous predictor.
gbar: \(\bar{G}\), log-mean formant-frequency (or its scaled counterpart gbar_s), a continuous predictor.
vowel: a factor with two levels, vowel1=a, -vowel1=i.
subj: a factor indicating subject/listener (n=10).
speaker: a factor indicating speaker (n=139).
To this point we’ve only discussed the fundamental frequency (f0) of vowel sounds, related to voice pitch. Although this is a simpler variable to conceive of, it turns out that it’s not really the best predictor of the perception of many salient indexical characteristics such as height, age and gender.
In my work on the perception of apparent speaker characteristics, I have repeatedly found that the formant frequencies influence speaker judgments much more than f0. A brief explanation of the relationship between formant frequencies and size goes:
On average taller people have longer vocal tracts.
People with longer vocal tracts produce lower resonance frequencies (i.e. formants). Think of the difference between a trumpet and a tuba (longer = lower frequency).
If we measure the formants of a vowel sound and take the average, we get a reasonable estimate of a person’s vocal tract length and height (see point 1).
Listeners appear to use this information to assess the height and age of speakers.
To measure the average formant frequency, we use the average log-transformed formant frequency across both productions available for each speaker (denoted by \(\bar{G}\)). Again I won’t get into the details of why, but it turns out that using the logarithm of the frequency has certain desirable characteristics for our linear models (explained in painstaking detail here).
Ok, so all of the above is just to justify the inclusion of this second acoustic predictor. In the figure below we can see that f0 is obviously correlated with perceived height. The figure includes Pearson’s correlation coefficient calculated between the two variables. This is a value between -1 and 1 indicating how closely the two variables form a perfect line when plotted together.
The correlations calculated below are not really ‘valid’ for repeated measures data like what we have. However, they do help us confirm what the figures below show: It seems that \(\bar{G}\) has a more linear relationship to perceived height than f0. Further, the two predictors are highly correlated overall. This makes sense since across the human population, shorter people (with higher formants) generally also have higher pitches (e.g. there are no low-pitched babies).
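As a quick refresher on what the endpoints of this scale mean, here is a small synthetic example (made-up numbers, not our data): cor calculates Pearson’s correlation, which reaches 1 or -1 only when the points fall on a perfect line.

```r
# a perfect increasing line yields r = 1
x = 1:100
cor (x, 2*x + 3)
# a perfect decreasing line yields r = -1
cor (x, -x)
# unrelated variables yield r near 0
set.seed (1)
cor (x, rnorm (100))
```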
Figure 9.1: (left) Perceived height plotted against log f0. (middle) Perceived height plotted against \(\bar{G}\), the log-mean formant frequency. (right) Log f0 plotted against \(\bar{G}\). Each panel shows the Pearson correlation (r) for the plotted variables.
The location of the points in the 2-dimensional plots is defined by the value of the two variables defining the dimensions. Each point is actually associated with 3 continuous variables (f0, \(\bar{G}\), and perceived height), meaning that each point is actually associated with a specific location in a 3-dimensional space.
Imagine you’re holding a clear plastic cube containing points arranged like in the plots above. Looking ‘through’ the sides of the cube from different directions would result in you seeing arrangements just like the plots above. In the 2d plots above I can only show you static images of 3 of the faces of this cube. However the interactive plot below can be rotated such that when you look ‘down’ certain axes you will see plots that look just like the ones above.
Figure 9.2: An interactive 3-dimensional plot of perceived height, log f0, and \(\bar{G}\), along with a best-fitting plane. Rotating the plot to look ‘down’ each axis recovers views like the 2-dimensional plots above.
When considered in three dimensions, rather than forming a line, these points can form a plane. For example, the points above almost look like stairs, or a bridge, going from smaller to larger perceived heights.
A regression model that includes two continuous predictors tries to find the ‘best’ plane that passes through your points in the three-dimensional space. ‘Best’ is defined as the one that minimizes the distance from each point to the surface of the plane along the axis defined by the dependent variable. Though it’s a bit more complicated than this for multilevel models, this is still basically what’s happening.
The figure above also shows the ‘best’ plane going through our points, with and without our data. Just like our slope coefficients changed the slope of our lines, our slope coefficients now change the slope of our planes. Since the plane has two dimensions it has two slopes: a field can be downhill away from you, but also up/downhill left to right. Our intercept coefficients will change the intercepts of our planes, sliding entire planes up/down without changing their slopes.
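The plane-fitting idea can be seen in miniature with ordinary least squares (a sketch using made-up data, not our model): with two continuous predictors, the model estimates one intercept and two slopes, which together define a plane.

```r
# simulate points scattered around the plane z = 60 - 1.5*x - 3*y
set.seed (1)
x = rnorm (200)
y = rnorm (200)
z = 60 - 1.5*x - 3*y + rnorm (200, sd = 0.5)
# lm recovers the intercept and the two slopes of the 'best' plane,
# minimizing distance to the points along the z (dependent variable) axis
fit = lm (z ~ x + y)
round (coef (fit), 1)
```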
9.2 Using multiple continuous predictors
We’re going to begin by modeling perceived height as a function of f0 and \(\bar{G}\), except we’re going to include all of the other complexity we’ve been building up so far, and some new predictors. Now that we can use Bayesian ANOVA to interpret our models, using so many predictors isn’t as daunting as it might otherwise be.
9.2.1 Describing and fitting the model
Our model formula is going to be as below. Note that padult and pgender interact with our continuous predictors, but vowel does not. This is because I think it’s reasonable that perceived gender or adultness affects the use of acoustic cues. However, I think vowel category will probably not have such a complicated effect and so it is included only as an intercept.
pheight ~ (g0_s + gbar_s) * padult * pgender + vowel +
((g0_s + gbar_s) * padult * pgender + vowel | subj) +
(1 | speaker)
Which in plain English means:

“We’re predicting perceived height using two continuous predictors (g0_s and gbar_s). The slopes of these planes are allowed to vary according to padult, pgender, and the interaction of the two. Our model includes intercept shifts for these planes according to padult, pgender, the interaction of the two, and vowel. All of the aforementioned effects are included as ‘random’ by-subject effects, and the model includes random by-speaker intercepts as well.”
Below, I set up the necessary variables in our data:
# standardize log f0
h95$g0_s = (h95$g0-mean(h95$g0)) / sd(h95$g0)
# standardize log geometric mean formant frequency
h95$gbar_s = (h95$gbar-mean(h95$gbar)) / sd(h95$gbar)
# create gender and adultness variables
h95$pgender = c('m','w')[h95$pgroup %in% c('g','w')+1]
h95$padult = c('c','a')[h95$pgroup %in% c('m','w')+1]

And fit our model.
library (brms)
options (contrasts = c("contr.sum","contr.sum"))
set.seed (1)
height_perception_multi =
brms::brm (pheight ~ (g0_s + gbar_s)*padult*pgender+vowel +
((g0_s + gbar_s)*padult*pgender+vowel|subj) +
(1|speaker),
data=h95, chains=4, cores=4, warmup=1000, iter = 6000, thin = 4,
control = list(adapt_delta = 0.95),
prior = c(set_prior("student_t(60, 0, 12)", class = "Intercept"),
set_prior("student_t(3, 0, 6)", class = "b"),
set_prior("student_t(3, 0, 6)", class = "sd"),
set_prior("lkj_corr_cholesky (2)", class = "cor")))
# save model
# saveRDS (height_perception_multi, '9_height_perception_multi.RDS')

9.2.2 Interpreting the model
I’m not going to show the model print statement because it’s several pages long, although I suggest you have a look at it. Instead, we are going to print just the fixed effects and calculate a Bayesian ANOVA.
fixefs = fixef (height_perception_multi)
fixefs
## Estimate Est.Error Q2.5 Q97.5
## Intercept 61.4316520 0.57274665 60.3026838 62.58137427
## g0_s -1.5139547 0.34844231 -2.1950206 -0.82717675
## gbar_s -3.0226344 0.37068405 -3.7583257 -2.29727790
## padult1 3.4151184 0.52362667 2.3514839 4.46738164
## pgender1 -0.9663628 0.33552523 -1.6360612 -0.29110462
## vowel1 -0.2323974 0.06844473 -0.3676769 -0.09972511
## g0_s:padult1 0.1087725 0.24090763 -0.3737782 0.58894607
## gbar_s:padult1 0.8556476 0.28387466 0.2844266 1.43293158
## g0_s:pgender1 -0.3373060 0.19375078 -0.7131054 0.04554423
## gbar_s:pgender1 0.2663032 0.18770930 -0.1095363 0.63366910
## padult1:pgender1 0.3061490 0.32586586 -0.3224177 0.97448355
## g0_s:padult1:pgender1 -0.0951291 0.21138089 -0.5138048 0.30520675
## gbar_s:padult1:pgender1 -0.5100064 0.19986603 -0.8978581 -0.11959186
baov = banova(height_perception_multi)

We then use this information to make an ANOVA plot to see what the major sources of variation are in this dataset. Both acoustic predictors have a strong influence on the perception of height. In addition, the perception of adultness matters a lot, and there appears to be a non-zero adultness by \(\bar{G}\) interaction. There also appears to be a small effect for the perception of gender, an even smaller effect for vowel category, and perhaps some other small interactions.
par (mar = c(4.2,13,1,1))
banovaplot (baov[-2,], las = 2, xlab="Effect (inches)", horizontal = FALSE,
xaxs = 'i', xlim = c(0,5))
Figure 9.3: A Bayesian ANOVA plot of our model. These sorts of plots are described in chapter 8.
In other chapters I talk about how to recover and interpret specific parameters, and all of the same methods can be applied here. I am going to interpret only a subset of the possible parameter combinations, mostly as they relate to useful examples. The interpretation is going to focus on the geometry of the planes suggested by our model.
First we can consider the effect of intercept shifts on the planes if we constrain all the planes to have the same slopes. On the left, we see one overall plane for all points. In the middle we have two planes, the upper one for adults and the lower one for children. On the right, there are four planes: the upper and lower planes have both been split into male (upper) and female (lower) planes within each age group. I didn’t make a plot of the interaction between perceived adultness and perceived gender, but if I had it would manifest as a difference in the spacing between the lower and upper pairs of planes on the rightmost plot!
Figure 9.4: Planes constrained to share the same slopes: one overall plane; two planes with intercepts shifted by perceived adultness; four planes with intercepts shifted by perceived adultness and perceived gender.
The planes above correspond to equations that would look something like below. Of course, the notation below is neither a model formula nor a prediction equation so it kind of doesn’t make sense. However, I think it gets the general point across: the planes above all have the same slopes and just differ in the application of the padult and pgender intercept shifts.
Left:
pheight = (g0_s + gbar_s)
Middle:
pheight = (g0_s + gbar_s) + padult1
pheight = (g0_s + gbar_s) - padult1
Right:
pheight = (g0_s + gbar_s) + padult1 + pgender1
pheight = (g0_s + gbar_s) + padult1 - pgender1
pheight = (g0_s + gbar_s) - padult1 + pgender1
pheight = (g0_s + gbar_s) - padult1 - pgender1
Now, we will consider different planes constrained to have the same intercept. On the left, we see one overall plane for all points. In the middle we have two planes showing a difference in slope along \(\bar{G}\) based on perceived adultness, the plane with the more negative slope corresponding to perceived children. On the right, the two planes instead differ in slope along f0 based on perceived adultness.
Figure 9.5: Planes constrained to share the same intercept: one overall plane; two planes differing in slope along \(\bar{G}\) according to perceived adultness; two planes differing in slope along f0 according to perceived adultness.
The planes above correspond to equations that would look something like below. Again, I am using these weird pseudo-formulas that I hope make sense. The main idea now is: the planes above all have the same intercept and only differ in their slopes along each dimension based on padult.
Left:
pheight = g0_s + gbar_s
Middle:
pheight = g0_s + (gbar_s + gbar_s:padult1)
pheight = g0_s + (gbar_s - gbar_s:padult1)
Right:
pheight = (g0_s + g0_s:padult1) + gbar_s
pheight = (g0_s - g0_s:padult1) + gbar_s
Below, I show a final set of planes representing what might be a ‘final’ model that captures most of the important variation in the data without over-complicating things. The planes below reflect a fixed-effects structure corresponding to a model formula of:
pheight ~ g0_s + gbar_s*padult
In other words, the model predicts perceived height based on g0, \(\bar{G}\), and perceived adultness, with an interaction between \(\bar{G}\) and perceived adultness only.
Figure 9.6: The planes implied by a reduced model in which perceived height depends on g0, \(\bar{G}\), perceived adultness, and the interaction of \(\bar{G}\) and perceived adultness, shown from several angles.
9.3 Considering predictive accuracy
We haven’t really talked about the accuracy with which our models predict our data very much, but we’re going to talk about this a bit here.
First, we’re going to predict the data using our model, which is easy using the predict function provided by brms. This function provides the posterior predictions (\(y_{rep}\), for replicated) generated by the model, which can be thought of in either of the following ways:
\[ \mu = Intercept + x_1 + x_2 + ... + x_n \\ y_{rep} = \mu + \epsilon \\ \epsilon \sim \mathrm{Normal} (0,\ \sigma_{error}) \\ \]
Above we see that the posterior predictions generated by our model consist of the linear prediction (\(\mu\)) combined with normally-distributed error (with a standard deviation based on our data). Below, we directly generate a normally-distributed random variable with a given mean and standard deviation parameter:
\[ \mu = Intercept + x_1 + x_2 + ... + x_n \\ y_{rep} \sim \mathrm{Normal} (\mu,\ \sigma_{error}) \]
It’s important to note that the posterior predictions contain our predicted values plus random error. This increases the variation we can expect across individual samples. However, because the error has a mean of zero, the average of \(y_{rep}\) across all posterior samples for a given data point will tend to equal the average of \(\mu\).
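We can verify this with a quick simulation (made-up numbers standing in for posterior samples, not our model’s actual posterior): adding error to \(\mu\) inflates the variability of individual samples of \(y_{rep}\), but leaves its average essentially unchanged.

```r
# 4000 hypothetical posterior samples of the linear prediction
# for a single data point
set.seed (1)
mu = rnorm (4000, mean = 61, sd = 0.5)
# posterior predictions add data-level error on top of mu
y_rep = mu + rnorm (4000, mean = 0, sd = 2.5)
# the averages are nearly identical...
c(mean (mu), mean (y_rep))
# ...but individual predictions are much more variable
c(sd (mu), sd (y_rep))
```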
If we look at the output of the function we can see that it is a matrix with as many rows as we have observations. For each observation it provides the mean posterior prediction, and a range of credible values.
y_pred_full = predict (height_perception_multi)
str (y_pred_full)
## num [1:2780, 1:4] 53.1 53.4 51.3 53.4 63.7 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
head (y_pred_full)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 53.11212 2.705120 47.90816 58.48499
## [2,] 53.35546 2.733928 48.03373 58.71722
## [3,] 51.27269 2.707167 45.97102 56.67369
## [4,] 53.38857 2.799813 48.05878 59.00738
## [5,] 63.72664 2.764811 58.18700 69.08884
## [6,] 52.61264 2.715479 47.30708 57.88366

Below we can use the predict function to investigate how accurately our model can predict our data. Using the re_formula parameter, we can make predictions using different random effects.
The reason we might want to omit random effects is that if our model predicts well only thanks to its random effects, it may not generalize to the broader population. So, making sure that our models can accurately predict our data even when using only the ‘population level’ effects is an important aspect of checking model predictive accuracy.
# no random effects in predictions
y_pred_no_re = predict (height_perception_multi, re_formula = NA)
# only speaker random effects in predictions
y_pred_sp = predict (height_perception_multi, re_formula = ~(1|speaker))
# only subject random effects in predictions
y_pred_su =
predict (height_perception_multi,
re_formula = ~ ((g0_s + gbar_s) * padult * pgender + vowel | subj))
# subject and speaker random effects in predictions
y_pred_full =
predict (height_perception_multi,
re_formula = ~ ((g0_s + gbar_s) * padult * pgender + vowel | subj) +
(1|speaker))I subtract our actual data from our model predictions to calculate the residuals under different conditions. The residuals are the difference between the data we observed and the predictions made by our model for each data point.
prediction_error =
data.frame (no_re = y_pred_no_re[,1] - h95$pheight,
sp = y_pred_sp[,1] - h95$pheight,
su = y_pred_su[,1] - h95$pheight,
full = y_pred_full[,1] - h95$pheight)

Below you can see histograms of each of the sets of residuals from above. It seems that the subject random effects are more important than the speaker random effects for predictions, though both matter.
Figure 9.7: Histogram of prediction error for predictions with no random-effects (No RE), speaker random-effects only (SP Only), subject random-effects only (Su Only), or full random effects (Full RE).
There are many ways to consider the ‘average’ error. Two of the most useful and common are the root-mean-squared error and the mean absolute error. Below I define simple functions that carry out each operation.
# mean absolute error
mae = function (x) mean (abs (x))
# root-mean-squared error
rms = function (x) sqrt (mean (x^2))

The main difference between them is in how they ‘equalize’ negative and positive values. The rms function takes the squares of values, while the mae function uses the absolute value of each error. The practical effect of this difference is that rms error (i.e., error as defined by the rms function, the root-mean-squared error) will give more weight to larger errors.
For example consider the ‘average’ residual calculated for the two functions for this tiny set of errors: c(2, -1, 0, 0, 10). The mae function returns a value of 2.6, while the rms function returns a value of 4.6. This is because in squaring the residuals, the function substantially increases the influence of large deviations. The squares of our residuals are 4, 1, 0, 0, and 100. The difference between our largest value and smallest value has gone from 5 (10/2) to 25 (100/4) after squaring.
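We can verify these numbers directly using the mae and rms functions defined above (repeated here so the chunk runs on its own):

```r
# mean absolute error and root-mean-squared error (as defined above)
mae = function (x) mean (abs (x))
rms = function (x) sqrt (mean (x^2))
errors = c(2, -1, 0, 0, 10)
mae (errors)
## [1] 2.6
rms (errors)  # sqrt(105/5) = sqrt(21)
## [1] 4.582576
```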
We can apply both functions column-wise to our residuals with the apply function.
apply (prediction_error, 2, rms)
## no_re sp su full
## 3.377059 3.283754 2.666586 2.550822
apply (prediction_error, 2, mae)
## no_re sp su full
## 2.642946 2.569479 2.038613 1.954712

We can see that, as expected, the MAE is smaller than the RMS error. Which should we use? Which is better? These are just two of many ways to calculate average error or accuracy. The RMS error relates directly to our standard deviation parameter and so is useful in that sense. For many purposes however, the MAE may be closer to what people mean by ‘accurate’ (i.e., how close can we get on average?).
9.4 Handling outliers: t-distributed errors
The t distribution is a lot like the normal distribution except that it has a third parameter that determines how much it differs from the normal distribution: a \(\nu\) parameter that can vary continuously from 1 to \(\infty\). When this parameter is equal to 1, the t distribution is maximally different from the normal distribution. As \(\nu\) approaches \(\infty\), the two distributions become identical.
Below we see densities of three t distributions varying in \(\nu\). We see that when \(\nu\) is small, the distribution is pointy and has what are called “fat tails”. For this reason the t distribution is sometimes said to resemble a witch’s hat. On the other hand when \(\nu\) is large, the distribution has much less density in its tails and looks more like a normal distribution.
The real difference is in the probability with which these distributions tolerate rare events. The three distributions begin to diverge at around ±2, a fact which comes across more clearly in the figure on the right below. Near ±2 the densities still differ only modestly across the \(\nu\) values, but further into the tails (e.g., around ±6) they differ by several orders of magnitude.
Figure 9.8: (left) Density of t distributions with \(\nu\) parameters of 1 (red), 10 (green) and 1000 (blue). (right) Same as left plot but the y axis is logarithmic. Each horizontal line indicates a 1000-fold change in probabilities.
As a result of this, a t distribution with a small \(\nu\) parameter will predict roughly the same behavior as a normal distribution inside of 2 standard deviations of the mean, but be much more tolerant to outliers in the data. In practice, this means using a model that assumes t-distributed, rather than normal, residuals will be more robust to outliers and will not be influenced as much by extreme values.
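R’s dnorm and dt functions make this easy to quantify. A quick check, using the standard (unit-scale) forms of each distribution:

```r
# density of a point 6 'standard units' from the center of each distribution
dnorm (6)       # essentially zero: 6-sigma events basically never
                # happen under a normal distribution
dt (6, df = 4)  # small, but a value a heavy-tailed t generates routinely
# the t distribution treats this 'outlier' as many orders of
# magnitude more probable than the normal distribution does
dt (6, df = 4) / dnorm (6)
```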
Below, I compare histograms of random data generated using the t and normal distributions. These distributions have the same standard deviation parameter (1) and mean (0). When we focus on the bulk of the distributions (±2 standard deviations from the mean), the data looks quite similar. However, when we zoom out we see that our t distribution has generated quite a few data points that would be considered extreme outliers by the normal distribution. The only way for our normal distribution to generate values so far from the mean would be to have a substantially larger standard deviation.
Figure 9.9: (top row) Histogram of 10,000 random draws from a normal distribution. Left and right columns present the same numbers, just a different x-axis range. Points along the bottom indicate individual observations. (bottom row) Same as top row but data is drawn from a t distribution with a \(\nu\) parameter of 4.
Below we can see that the outliers present in the t-distributed data result in a 40% overestimation of the standard deviation of the distribution (1.4/1), and make it seem that the t-distributed data is much more spread out than the normally-distributed data. When we consider a more robust measure of the spread of the data (the interquartile range), the variables actually appear much more similar.
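The code that generated x_norm and x_t isn’t shown above; a plausible re-creation is below (exact values will differ from those printed below since I don’t know the original seed). Note that a standard t with \(\nu = 4\) has a scale parameter of 1 but a true standard deviation of \(\sqrt{\nu/(\nu-2)} = \sqrt{2} \approx 1.41\), which is why the heavy tails inflate the sample standard deviation.

```r
# draw 10,000 values from each distribution: both have location 0
# and scale parameter 1
set.seed (1)
x_norm = rnorm (10000, mean = 0, sd = 1)
x_t = rt (10000, df = 4)
# the t's heavy tails inflate its sample standard deviation...
c(sd (x_norm), sd (x_t))
# ...but the interquartile ranges are much more similar
c(IQR (x_norm), IQR (x_t))
```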
# compare standard deviations of variables
sd (x_norm)
## [1] 1.012356
sd (x_t)
## [1] 1.429296
# compare inter-quartile ranges of variables
diff ( quantile (x_norm, c(.25,.75)))
## 75%
## 1.351055
diff ( quantile (x_t, c(.25,.75)))
## 75%
## 1.473742

One reason that we care about this for our linear models is that our estimate of \(\sigma_{error}\) affects the certainty in all of our parameter estimates. As a result, outliers that increase the estimate of \(\sigma_{error}\) spread uncertainty throughout the whole model. By using t-distributed errors we can potentially get better parameter estimates and not have to omit outliers from our data.
Below we use the residuals function to collect the residuals from the model we fit above. The residuals function gives you a posterior mean and credible intervals for your residuals (just like the predict function does for your predictions).
resids = residuals (height_perception_multi)
head (resids)

## Estimate Est.Error Q2.5 Q97.5
## [1,] -1.82732543 0.6721898 -3.1178915 -0.4707440
## [2,] -2.46747336 0.7306226 -3.8611248 -1.0282123
## [3,] 0.95301794 0.6094575 -0.2423565 2.1408184
## [4,] 2.24925404 0.8582907 0.5669350 3.9378093
## [5,] -1.08832468 0.8496259 -2.7427952 0.5509476
## [6,] 0.07557465 0.6404586 -1.1794754 1.3211662
Below we can compare the density of the residuals with the densities of normal and t distributions with the same standard deviation. The logarithmic axis makes it easier to understand the behavior of the distributions in the tails. For instance, everything below the bottom horizontal line (at 0.01 in the left panel and 1e-03 in the right panel) is very difficult to see without logarithmic y axes.
In the middle plot below we see that a normal distribution does a very good job of representing the residuals in the bulk of the distribution. However, our residuals have a lot more ‘density’ in values greater than 10 and less than -10 (i.e., in the ‘tails’). In the right panel we see that a t distribution (with \(\nu = 10\)) offers a closer approximation of the overall shape of the distribution, even for extreme values.
Figure 9.10: (left) Histogram of residuals. (middle) Density of residuals (blue) compared to density of a normal distribution (red). The y axis is logarithmic: each horizontal line is a 100-fold increase in the probability of an event. (right) Density of residuals (blue) compared to density of a t-distribution (green) with a nu parameter of 10.
9.4.1 Fitting and interpreting the model
Fitting a model with t-distributed errors is very simple with brms: you just need to change the family parameter to family=student. When you do this, that tells brms that your model is now like this (for a simple regression model):
\[ \begin{equation} \begin{split} \textrm{Likelihood:} \\ y_{[i]} \sim \mathrm{student}(\nu, \mu_{[i]},\sigma_{error}) \\ \mu_{[i]} = a_{[i]} + b_{[i]} * \mathrm{x}_{[i]} \\ \end{split} \tag{9.1} \end{equation} \]
Notice that we are now drawing our data from a t distribution, and the distribution now also includes the \(\nu\) parameter. You don’t really need to worry about the prior for \(\nu\): the default behavior is a prior of "gamma(2, 0.1)" with a fixed lower bound of 1 (explained in the brms documentation). This will be good enough for most situations.
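To get a feel for what this default implies, we can look at the quantiles of a gamma(2, 0.1) distribution (a quick sketch; brms additionally bounds \(\nu\) below at 1, which I ignore here):

```r
# quantiles of a gamma distribution with shape 2 and rate 0.1:
# most of the prior mass sits on fairly large values of nu
# (i.e., near-normal errors), while still allowing small values
qgamma (c(0.025, 0.5, 0.975), shape = 2, rate = 0.1)
# the prior mean of nu is shape/rate = 20
2 / 0.1
```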
We’re going to fit a reduced version of the model above (mostly just to make our lives easier). We’re going to tell brms to use a t distribution by setting family = 'student'.
library (brms)
options (contrasts = c("contr.sum","contr.sum"))
set.seed (1)
height_perception_student =
brms::brm (pheight ~ g0_s + gbar_s * padult+pgender+vowel +
(g0_s + gbar_s * padult+pgender+vowel|subj) +
(1|speaker),
data=h95, chains=4, cores=4, warmup=1000, iter = 6000, thin = 4,
control = list(adapt_delta = 0.95), family = 'student',
prior = c(set_prior("student_t(60, 0, 12)", class = "Intercept"),
set_prior("student_t(3, 0, 6)", class = "b"),
set_prior("student_t(3, 0, 6)", class = "sd"),
set_prior("lkj_corr_cholesky (2)", class = "cor")))
# save model
saveRDS (height_perception_student, '9_height_perception_student.RDS')

If we inspect the print statement we will see all of the usual information, with a new line for \(\nu\) (nu) in the “Family Specific Parameters” section. We can see that our model suggests our residuals resemble a t distribution with 7 degrees of freedom, meaning it is not really close to normal.
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 2.27 0.06 2.15 2.38 1.00 5075 4753
nu 7.06 1.09 5.33 9.63 1.00 5261 4941
9.5 Plot Code
################################################################################
### Figure 9.1
################################################################################
h95$group = factor (h95$group)
h95$g0_s = (h95$g0-mean(h95$g0)) / sd(h95$g0)
h95$gbar_s = (h95$gbar-mean(h95$gbar)) / sd(h95$gbar)
agg_data = aggregate (cbind ( pheight, g0,g0_s,gbar,gbar_s) ~ speaker+group,
data = h95, FUN = median)
cols2 = c(deepgreen, darkorange, skyblue, lavender)
par (mfrow = c(1,3), mar = c(4,4,1,1), oma = c(1,1,0,0))
plot (agg_data$g0, agg_data$pheight, cex =2, col = cols2[c(1:4)][agg_data$group],
xlim=c(4.4,5.9), pch=16,lwd=2,ylim = c(45,75),ylab="Height (inches)",
xlab="log f0 (Hz)")
grid()
abline(lm (pheight ~ g0, data = agg_data)$coefficients, lwd=2,lty=3)
text (5.6,72, cex=1.1, label=paste0("r = ",
round(cor(agg_data$g0,agg_data$pheight),2)))
plot (agg_data$gbar, agg_data$pheight, cex =2, col = cols2[c(1:4)][agg_data$group],
xlim=c(7.3,7.7), pch=16,lwd=2,ylim = c(45,75),ylab="Height (inches)",
xlab="log GBAR (Hz)")
grid()
abline(lm (pheight ~ gbar, data = agg_data)$coefficients, lwd=2,lty=3)
text (7.6,72, cex=1.1, label=paste0("r = ",
round(cor(agg_data$gbar,agg_data$pheight),2)))
plot (agg_data$gbar, agg_data$g0, cex =2, col = cols2[c(1:4)][agg_data$group],
xlim=c(7.3,7.7), pch=16,lwd=2,ylim = c(4.4,5.9),ylab="log f0 (Hz)",
xlab="log GBAR (Hz)")
grid()
abline(lm (g0 ~ gbar, data = agg_data)$coefficients, lwd=2,lty=3)
text (7.6,4.6, cex=1.1, label=paste0("r = ",
round(cor(agg_data$g0,agg_data$gbar),2)))
################################################################################
### Figure 9.2
################################################################################
normal_g0_gbar = readRDS ('../../models/9_height_perception_multi.RDS')
fixefs = brms::fixef (normal_g0_gbar)[,1]
n=100
x = seq (min(agg_data$g0_s),max(agg_data$g0_s), length.out=n)
y = seq (min(agg_data$gbar_s),max(agg_data$gbar_s), length.out=n)
z0 = seq (0,1,length.out=n)
xx = rep (x, each = n)
yy = rep (y, n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy
zz0 = matrix (z,n,n)
fig3 <- plot_ly(agg_data, x = ~g0_s, y = ~gbar_s, z = ~pheight,
#color = ~group, colors = c(coral,yellow,deepgreen,teal),
showlegend=TRUE)
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz0, showscale = FALSE,name="Overall",
colorscale = "Portland")
fig3 <- fig3 %>% add_trace(agg_data, color = ~group, type="scatter3d",
colors = c(coral,yellow,deepgreen,teal), mode="markers")
fig3 <- fig3 %>% layout(title = "",showlegend = TRUE,
scene = list(domain=list(x=c(0,1),y=c(0,1)),
aspectmode='cube',
xaxis = list(title = 'log f0'),
yaxis = list(title = 'log mean FF'),
zaxis = list(title = 'Perceived Height (inches)')))
fig3
################################################################################
### Figure 9.3
################################################################################
fixefs = fixef (height_perception_multi)[,1]
n=100
x = seq (min(agg_data$g0_s),max(agg_data$g0_s), length.out=n)
y = seq (min(agg_data$gbar_s),max(agg_data$gbar_s), length.out=n)
z0 = seq (0,1,length.out=n)
xx = rep (x, each = n)
yy = rep (y, n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy
zz0 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy + fixefs[4]
zz1 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy - fixefs[4]
zz2 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy + fixefs[4] + fixefs[5]
zz3 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy + fixefs[4] - fixefs[5]
zz4 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy - fixefs[4] + fixefs[5]
zz5 = matrix (z,n,n)
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy - fixefs[4] - fixefs[5]
zz6 = matrix (z,n,n)
#################################################################
fig3 <- plot_ly(agg_data, x = ~g0_s, y = ~gbar_s, z = ~pheight,
#color = ~group, colors = c(coral,yellow,deepgreen,teal),
showlegend=TRUE)
#fig3 <- fig3 %>% add_markers()
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz0, showscale = FALSE,name="Overall",
colorscale = list(c(0, 1), c("grey", "grey")))
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz1, showscale = FALSE,name="Adult",
colorscale = list(c(0, 1), c("blue", "blue")))
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz2, showscale = FALSE,name="Child",
colorscale = list(c(0, 1), c("red", "red")))
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz3, showscale = FALSE,name="b",
colorscale = "Portland")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz4, showscale = FALSE,name="g",
colorscale = "Portland")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz5, showscale = FALSE,name="m",
colorscale = "Portland")
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz6, showscale = FALSE,name="w",
colorscale = "Portland")
fig3 <- fig3 %>% add_trace(agg_data, color = ~group,
colors = c(coral,yellow,deepgreen,teal), mode="markers")
fig3 <- fig3 %>% layout(title = "",showlegend = TRUE,
scene = list(domain=list(x=c(0,1),y=c(0,1)),
aspectmode='cube',
xaxis = list(title = 'log f0'),
yaxis = list(title = 'log mean FF'),
zaxis = list(title = 'Perceived Height (inches)')))
fig3
################################################################################
### Figure 9.4
################################################################################
# posterior means of the fixed effects
fixefs = fixef (height_perception_multi)[,1]
# grid of scaled predictor values spanning the observed data
n = 100
x = seq (min(agg_data$g0_s), max(agg_data$g0_s), length.out = n)
y = seq (min(agg_data$gbar_s), max(agg_data$gbar_s), length.out = n)
xx = rep (x, each = n)
yy = rep (y, n)
# overall surface: intercept plus the g0 and gbar slopes
z = fixefs[1] + fixefs[2]*xx + fixefs[3]*yy
zz0 = matrix (z, n, n)
# surfaces whose gbar slope is modified by the interaction term (fixefs[8])
z = fixefs[1] + fixefs[2]*xx + (fixefs[3]+fixefs[8])*yy
zz1 = matrix (z, n, n)
z = fixefs[1] + fixefs[2]*xx + (fixefs[3]-fixefs[8])*yy
zz2 = matrix (z, n, n)
# surfaces whose g0 slope is modified by the same term
z = fixefs[1] + (fixefs[2]+fixefs[8])*xx + fixefs[3]*yy
zz3 = matrix (z, n, n)
z = fixefs[1] + (fixefs[2]-fixefs[8])*xx + fixefs[3]*yy
zz4 = matrix (z, n, n)
#################################################################
fig3 <- plot_ly(agg_data, x = ~g0_s, y = ~gbar_s, z = ~pheight,
#color = ~group, colors = c(coral,yellow,deepgreen,teal),
showlegend=TRUE)
#fig3 <- fig3 %>% add_markers()
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz0, showscale = FALSE,name="Overall",
colorscale = list(c(0, 1), c("grey", "grey")))
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz1, showscale = FALSE,name="Adult",
colorscale = list(c(0, 1), c("blue", "blue")))
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz2, showscale = FALSE,name="Child",
colorscale = list(c(0, 1), c("red", "red")))
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz3, showscale = FALSE,name="Adult",
colorscale = list(c(0, 1), c("blue", "blue")))
fig3 <- fig3 %>% add_surface (x=x,y=y,z = zz4, showscale = FALSE,name="Child",
colorscale = list(c(0, 1), c("red", "red")))
fig3 <- fig3 %>% add_trace (data = agg_data, color = ~group,
                colors = c(coral,yellow,deepgreen,teal), mode = "markers")
fig3 <- fig3 %>% layout(title = "", showlegend = TRUE,
                scene = list(domain = list(x=c(0,1), y=c(0,1)),
                             aspectmode = 'cube',
                             xaxis = list(title = 'log f0'),
                             yaxis = list(title = 'log mean FF'),
                             zaxis = list(title = 'Perceived Height (inches)')))
fig3
################################################################################
### Figure 9.5
################################################################################
# posterior means of the fixed effects
fixefs = fixef (height_perception_multi)[,1]
# grid of scaled predictor values spanning the observed data
n = 100
x = seq (min(agg_data$g0_s), max(agg_data$g0_s), length.out = n)
y = seq (min(agg_data$gbar_s), max(agg_data$gbar_s), length.out = n)
xx = rep (x, each = n)
yy = rep (y, n)
#################################################################
# surfaces combining the intercept offset (fixefs[4]) and the slope
# modification (fixefs[8]) for the two apparent-adultness levels
z = fixefs[1] + fixefs[2]*xx + (fixefs[3]+fixefs[8])*yy + fixefs[4]
zz1 = matrix (z, n, n)
z = fixefs[1] + fixefs[2]*xx + (fixefs[3]-fixefs[8])*yy - fixefs[4]
zz2 = matrix (z, n, n)
fig2 <- plot_ly(agg_data, x = ~g0_s, y = ~gbar_s, z = ~pheight, scene="scene2",
color = ~group, colors = c(coral,yellow,deepgreen,teal))
fig2 <- fig2 %>% add_markers()
fig2 <- fig2 %>% add_surface (x=x,y=y,z = zz1, showscale = FALSE,
colorscale = "Portland")
fig2 <- fig2 %>% add_surface (x=x,y=y,z = zz2, showscale = FALSE,
colorscale = "Portland")
fig2 <- fig2 %>% layout(title = "",showlegend = TRUE,
scene = list(domain=list(x=c(0,0.33),y=c(0,1)),
aspectmode='cube',
xaxis = list(title = 'log f0'),
yaxis = list(title = 'log mean FF'),
zaxis = list(title = 'Perceived Height (inches)')))
fig2
################################################################################
### Figure 9.6
################################################################################
# prediction-error histograms for models with different random-effects
# structures: none, speaker only, subject only, and both
par (mfrow = c(1,4), mar = c(3,1,3,1), oma = c(2,4,0,0))
hist (prediction_error[,1], breaks = 20, col = cols[2],main="No RE",
freq= FALSE, xlim = c(-15,15), xlab="",ylab="")
hist (prediction_error[,2], breaks = 20, col = cols[1],main="Sp Only",
freq= FALSE, xlim = c(-15,15), xlab="",ylab="")
hist (prediction_error[,3], breaks = 20, col = cols[6],main="Su Only",
freq= FALSE, xlim = c(-15,15), xlab="",ylab="")
hist (prediction_error[,4], breaks = 20, col = cols[5],main="Full RE",
freq= FALSE, xlim = c(-15,15), xlab="",ylab="")
mtext (side=1, outer = TRUE, text="Prediction Error (Inches)", line= .7)
mtext (side=2, outer = TRUE, text="Density", line= 2)
################################################################################
### Figure 9.7
################################################################################
par (mfrow = c(1,2), mar = c(4,4,1,1))
cols2 = c(skyblue, coral, deepgreen)
# t densities for increasing degrees of freedom approach the normal
nus = c(1,10,1000)
xs = seq (-6.5, 6.5, .01)
plot (xs, dt (xs, nus[1]), xlim = c(-5.7,5.7), col = cols2[1], lwd=3,
      ylim = c(0,.45), type = 'l', ylab="Density", xlab='x')
for (i in 2:length(nus)) curve (dt (x, nus[i]), add = TRUE, col = cols2[i],
                                lwd=3, xlim = c(-6.5,6.5))
grid()
# same densities on a log y axis, highlighting the heavier tails
plot (xs, dt (xs, nus[1]), xlim = c(-5.7,5.7), col = cols2[1], lwd=3,
      ylim = c(0.0000001,.45), log='y', type = 'l', ylab="Density", xlab="x")
for (i in 2:length(nus)) curve (dt (x, nus[i]), add = TRUE, col = cols2[i],
                                lwd=3, xlim = c(-6.5,6.5))
grid()
################################################################################
### Figure 9.8
################################################################################
par (mfrow = c(2,2), mar = c(2,2,1,1), oma = c(2,2,0,0))
set.seed(1)
# samples from a standard normal and from a heavier-tailed t distribution,
# plotted over narrow and wide x ranges below
x_norm = rnorm (10000, 0, 1)
x_t = rt (10000, 4)
hist (x_norm, breaks = 50, xlim = c(-4,4),main="",col=teal,freq=FALSE,
xlab="",ylab= "")
points (x_norm,rep(-.005, length(x_norm)),pch=16,col=teal,cex=1)
hist (x_norm, breaks = 50, xlim = c(-12,12),main="",col=teal,freq=FALSE,
xlab="",ylab= "")
points (x_norm,rep(-.005, length(x_norm)),pch=16,col=teal,cex=1)
hist (x_t, breaks = 50, xlim = c(-4,4),main="",col=lavender,freq=FALSE,
xlab="",ylab= "")
points (x_t,rep(-.005,length(x_t)),pch=16,col=lavender,cex=1)
hist (x_t, breaks = 50, xlim = c(-12,12),main="",col=lavender,freq=FALSE,
xlab="",ylab= "")
points (x_t,rep(-.005,length(x_t)),pch=16,col=lavender,cex=1)
mtext (side = 2, outer = TRUE, text = "Density", line = 1)
################################################################################
### Figure 9.9
################################################################################
par (mfrow = c(1,3), mar = c(4,4,1,1))
hist (resids[,1], breaks = 20, main="", col =skyblue, freq = FALSE,
xlab = "Residuals")
abline (h = c(.1,.01), lty = 3, v = c(seq(-15,15,5)))
# residual density on a log y axis vs. a normal density with matching SD
plot (density (resids[,1]), log='y', main = '', ylim = c(1e-07,1),
      xlim = range(resids[,1]), lwd=3, col=skyblue, xlab="Residuals")
curve (dnorm (x, 0, sd(resids[,1])), add = TRUE, xlim = c(-20,20), lwd=3,
      col=coral)
grid()
# residual density vs. a t density (nu = 10) with its x axis scaled by the
# residual standard deviation, for a visual comparison of tail shapes
plot (density (resids[,1]), log='y', main = '', ylim = c(1e-07,1),
      xlim = range(resids[,1]), col=skyblue, lwd=3, xlab="Residuals")
curve_dt = dt (seq(-6,6,.1), 10)
x2 = seq(-6,6,.1) * sd(resids[,1])
lines (x2, curve_dt, lwd=3, col=deepgreen)
grid()